1 Contributed articles: "In vivo" spam filtering: a challenge problem for KDD
Tom Fawcett 

December 2003 ACM SIGKDD Explorations Newsletter, volume 5 issue 2 
Publisher: ACM Press 

Full text available: pdf(260.66 KB) Additional Information: full citation, abstract, references, citings

Spam, also known as Unsolicited Commercial Email (UCE), is the bane of email 
communication. Many data mining researchers have addressed the problem of detecting 
spam, generally by treating it as a static text classification problem. True in vivo spam
filtering has characteristics that make it a rich and challenging domain for data mining. 
Indeed, real-world datasets with these characteristics are typically difficult to acquire and 
to share. This paper demonstrates some of these characteri ... 

Keywords: challenge problems, class skew, concept drift, cost-sensitive learning, data 
streams, imbalanced data, spam, text classification 



2 Invited workshop on conceptual information retrieval and clustering of documents:
Spam filters: bayes vs. chi-squared; letters vs. words 
Cormac O'Brien, Carl Vogel 

September 2003 Proceedings of the 1st international symposium on Information and 
communication technologies ISICT '03 

Publisher: Trinity College Dublin 

Full text available: pdf(93.10 KB) Additional Information: full citation, abstract, references, citings

We compare two statistical methods for identifying spam or junk electronic mail. Spam 
filters are classifiers which determine whether an email is junk or not. The proliferation of 
spam email has made electronic filtering vitally important. The magnitude of the problem 
is discussed. We examine the Naive Bayesian method in relation to the 'Chi by degrees of 
Freedom' approach, the latter used in the field of authorship identification. Both methods 
produce very promising results. However, the ... 



3 Features: Spam, Spam, Spam, Spam, Spam, the FTC, and Spam
Eric Allman

September 2003 Queue, Volume 1 issue 6

Publisher: ACM Press 

Full text available: pdf(1.28 MB)
A forum sponsored by the FTC highlights just how bad spam is, and how it's only going
to get worse without some intervention.

The Federal Trade Commission (FTC) held a forum on spam in Washington, D.C., April 30 
to May 2. Rather to my surprise, it was a really good, content-full event. The FTC folks 
had done their homework and had assembled panelists that ran the gamut from ardent 
anti-spammers all the way to hard-core spammers and everyone in between: lawyers, 
legitimate mar ... 

4 Ending spam's free ride
Aaron Weiss 

June 2003 netWorker, Volume 7 Issue 2 
Publisher: ACM Press 

Full text available: pdf(84.83 KB), html(22.89 KB) Additional Information: full citation, abstract, citings, index terms

Spam is now a felony in the state of Virginia, as long as the unsolicited messages contain 
falsified information about the sender. With the harshest anti-spam law in the United 
States recently passed, Virginia, home to major ISPs such as America Online, is hoping
to prove that legislation is finally providing the tools needed to cap the flow of bulk e-mail 
advertising. But to skeptics even the strongest anti-spam laws will be hobbled by long 
processing times and fuzzy jurisdiction. Rather, arg ... 

5 Spam!

Lorrie Faith Cranor, Brian A. LaMacchia 

August 1998 Communications of the ACM, Volume 41 issue 8 

Publisher: ACM Press 

Full text available: pdf(209.22 KB) Additional Information: full citation, references, citings, index terms, review
6 A statistical approach to the spam problem
Gary Robinson 

March 2003 Linux Journal, Volume 2003 issue 107 
Publisher: Specialized Systems Consultants, Inc. 

Full text available: html(24.38 KB) Additional Information: full citation, abstract, citings, index terms

Can mathematics tell spam apart from legitimate mail? Find out which approaches work
best in real-world tests.

7 SPAM on the menu: the practical use of remote messaging in community care
Keith Cheverst, Karen Clarke, Dan Fitton, Mark Rouncefield, Andy Crabtree, Terry Hemmings 
June 2002 ACM SIGCAPH Computers and the Physically Handicapped, Proceedings of
the 2003 conference on Universal usability CUU '03, issue 73-74
Publisher: ACM Press 

Full text available: pdf(541.44 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper presents some early design work of the 'Digital Care' project, developing 
technologies to assist care in the community for user groups with different support needs. 
Our focus is on developing a SMS Public Asynchronous Messenger (SPAM) system for SMS 
messaging to a situated display in hostels for ex-psychiatric patients run by a charitable 
Trust. Such settings pose both methodological and design challenges. We face the 
methodological challenge to uncover requirements in such a sensitiv ... 

Keywords: SMS messaging, community care, cultural probes, ethnography,
requirements, user workshops 



8 An experimental comparison of naive Bayesian and keyword-based anti-spam
filtering with personal e-mail messages
Ion Androutsopoulos, John Koutsias, Konstantinos V. Chandrinos, Constantine D.
Spyropoulos

July 2000 Proceedings of the 23rd annual international ACM SIGIR conference on 
Research and development in information retrieval SIGIR '00 

Publisher: ACM Press 

Full text available: pdf(811.41 KB) Additional Information: full citation, abstract, references, citings, index terms

The growing problem of unsolicited bulk e-mail, also known as "spam", has generated a 
need for reliable anti-spam e-mail filters. Filters of this type have so far been based 
mostly on manually constructed keyword patterns. An alternative approach has recently 
been proposed, whereby a Naive Bayesian classifier is trained automatically to detect 
spam messages. We test this approach on a large collection of personal e-mail messages, 
which we make publicly available in "encrypte ... 

Keywords: evaluation (general), filtering/routing, machine learning and IR, test
collections, text categorization 



9 Accepted Panels: Spam, spam, spam, spam: how can we stop it 
Jenny Preece, Jonathan Lazar, Elizabeth Churchill, Hans de Graaff, Batya Friedman, Joseph 
Konstan 

April 2003 CHI '03 extended abstracts on Human factors in computing systems CHI
'03

Publisher: ACM Press 

Full text available: pdf(181.74 KB) Additional Information: full citation, abstract, references, index terms

How do we keep our channels of electronic communication, both individual and group, 
open, while keeping out inappropriate and unrelated materials, such as spam? Does 
someone other than the intended recipient have the right to control what electronic mail 
users see? Might this lead to censorship? If others DO have the right to control what e- 
mail users see, how should this filtering or censorship occur? Are users aware of this 
filtering? If others are NOT controlling what users receive, what can ... 

10 Inside risks: Spam wars 
Lauren Weinstein 

August 2003 Communications of the ACM, Volume 46 issue 8 
Publisher: ACM Press 

Full text available: pdf(44.64 KB), html(7.58 KB) Additional Information: full citation, citings, index terms



11 Getting Rid of Spam: Blackmail
Brandon M. Browning 
March 1998 Linux Journal 
Publisher: Specialized Systems Consultants, Inc. 

Full text available: html(13.11 KB) Additional Information: full citation, references, index terms

12 SPAM: a microcode based tool for tracing operating system events
Stephen W. Melvin, Yale N. Patt 

December 1987 Proceedings of the 20th annual workshop on Microprogramming 
MICRO 20 

Publisher: ACM Press 

Full text available: pdf(405.55 KB) Additional Information: full citation, abstract, references, citings, index terms

We have developed a tool called SPAM (for System Performance Analysis using 
Microcode), based on microcode modifications to a VAX 8600, that traces operating 
system events as a side-effect to normal execution. This trace of interrupts, exceptions, 
system calls and context switches can then be processed to analyze operating system 
behavior for the purpose of debugging, tuning or development. SPAM allows 
measurements to be made on a fully operating UNIX system with little perturbation 
(typica ...

13 From the editor: spam, not spam, is the stuff of memories
Richard Vernon 

March 2002 Linux Journal, volume 2002 issue 95 
Publisher: Specialized Systems Consultants, Inc. 

Full text available: html(3.57 KB) Additional Information: full citation, index terms



14 Stop in the name of spam 

November 1998 Communications of the ACM, Volume 41 issue 11 
Publisher: ACM Press 

Full text available: pdf(223.63 KB) Additional Information: full citation, index terms



15 The last word: Spam I am?! 
Aaron Weiss 

June 2003 netWorker, Volume 7 issue 2 
Publisher: ACM Press 

Full text available: pdf(39.24 KB), html(8.78 KB) Additional Information: full citation, abstract, index terms

Several months ago I sent out an e-mail to a couple of thousand people and nearly lost 
my broadband access as a result. I maintain a personal Web site of a satirical nature
which has attracted a good deal of traffic and consistently positive feedback over the 
years. Eventually I tried to market a modest product based on the Web site. Far from a 
major retail launch, this was more akin to custom printing a bunch of t-shirts for friends 
and family. Because the nature of the site did not invite a hi ... 

16 CoNLL-2000 Short Papers: Combining text and heuristics for cost-sensitive spam
filtering

Jose M. Gomez Hidalgo, Manuel Mana Lopez, Enrique Puertas Sanz 

September 2000 Proceedings of the 2nd workshop on Learning language in logic and 
the 4th conference on Computational natural language learning - 
Volume 7 

Publisher: Association for Computational Linguistics 

Full text available: pdf(337.47 KB) Additional Information: full citation, abstract, references, citings

Spam filtering is a text categorization task that shows especial features that make it
interesting and difficult. First, the task has been performed traditionally using heuristics
from the domain. Second, a cost model is required to avoid misclassification of legitimate 
messages. We present a comparative evaluation of several machine learning algorithms 
applied to spam filtering, considering the text of the messages and a set of heuristics for 
the task. Cost-oriented biasing and evaluation is pe ... 

17 Inside risks: spam, spam, spam!

Peter G. Neumann, Lauren Weinstein 

June 1997 Communications of the ACM, volume 40 issue 6 

Publisher: ACM Press 



Full text available:
Additional Information: full citation, index terms



18 Preventing Spams and Relays
John Wong 

December 1998 Linux Journal 

Publisher: Specialized Systems Consultants, Inc. 

Full text available: html(15.18 KB) Additional Information: full citation, abstract, index terms

The smtpd package is a useful mail daemon for stopping spam, thereby saving money and
resources 



19 Spam the spammer
Gilbert W Held 

March 1998 International Journal of Network Management volume 8 issue 2 
Publisher: John Wiley & Sons, Inc. 

Full text available: pdf(11.89 KB) Additional Information: full citation, index terms



20 The effectiveness of task-level parallelism for high-level vision 
W. Harvey, D. Kalp, M. Tambe, D. McKeown, A. Newell

February 1990 ACM SIGPLAN Notices, Proceedings of the second ACM SIGPLAN
symposium on Principles & practice of parallel programming PPOPP
'90, Volume 25 Issue 3
Publisher: ACM Press 

Full text available: pdf(1.78 MB) Additional Information: full citation, abstract, references, citings, index terms

Large production systems (rule-based systems) continue to suffer from extremely slow 
execution which limits their utility in practical applications as well as in research settings. 
Most investigations in speeding up these systems have focused on match (or knowledge- 
search) parallelism. Although good speed-ups have been achieved in this process, these 
investigations have revealed the limitations on the total speed-up available from this 
source. This limited speed-up is insufficient to allevi ... 